Advanced Statistical Inference II

Chelsea Parlett-Pelleriti

Predictive Checks

Predictive Distributions

🤔 How do we use distributions of parameters (prior or posterior) to make predictions?

Predictive Distributions

Two sources of variability:

  1. Distributional (Prior or Posterior) Variability

  2. Sampling (Data) Variability

Predictive Distributions

Distributional Variability

How certain are we about the value of the parameters?

Predictive Distributions

Sampling Variability

How much will observed data typically vary from its expected value? That is, the inherent variability in the data around the true value.

Predictive Distributions

\[ p(\tilde{x} \mid x) \]

Probability of new sample \(\tilde{x}\) given the current data \(x\).

  1. draw \(\theta_i \sim \pi(\theta)\) (one sample of all the model parameters)

  2. draw \(\tilde{x}_i \sim p(\tilde{x} \mid \theta_i)\) (a sample of data from the model implied by \(\theta_i\))

Step 1 accounts for uncertainty about the parameters \(\theta\); step 2 accounts for sampling variability in the data.
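The two-step recipe above can be sketched directly in R. This is a toy model assumed purely for illustration: normal data with a normal prior on the mean.

```r
# prior predictive draws for a toy model:
# data model y ~ Normal(mu, 1), prior mu ~ Normal(0, 2)
set.seed(1)
n_draws <- 1000

# step 1: draw parameters from the (prior) distribution
mu_draws <- rnorm(n_draws, mean = 0, sd = 2)

# step 2: draw one observation from the model implied by each parameter draw
y_pred <- rnorm(n_draws, mean = mu_draws, sd = 1)

# y_pred combines both sources of variability:
# its variance is roughly 2^2 (parameters) + 1^2 (sampling) = 5
var(y_pred)
```

Swapping the prior draws in step 1 for posterior draws turns this into a posterior predictive distribution.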

Predictive Distributions

Two sources of variability:

  1. Distributional (Prior or Posterior) Variability

  2. Sampling (Data) Variability

Prior Predictive Check

Prior predictive checks allow us to simulate samples from our prior and make sure it lines up with our expectations about reality.

# bayes
priors <- c(
  prior("normal(0,5)", class = "b"),
  prior("normal(0,4)", class = "Intercept"),
  prior("gamma(0.1, 10)", class = "sigma")
)
mod_0 <- brm(y ~ x, data = df, prior = priors, sample_prior = "only") 
mod_0 |> pp_check()

Prior Predictive Check

Let’s adjust our priors to be more realistic.

# bayes
priors <- c(
  prior("normal(0,1)", class = "b"),
  prior("normal(0,1)", class = "Intercept"),
  prior("gamma(1, 1)", class = "sigma")
)
mod_1 <- brm(y ~ x, data = df, prior = priors, sample_prior = "only")
mod_1 |> pp_check()

Posterior Predictive Check

Posterior predictive checks do something similar, but simulate new data from the posterior instead of the prior. This lets us “look for systematic discrepancies between real and simulated data” (Gelman et al. 2004), which tells us a little about model fit.

# bayes
priors <- c(
  prior("normal(0,1)", class = "b"),
  prior("normal(0,1)", class = "Intercept"),
  prior("gamma(1, 1)", class = "sigma")
)
m1 <- brm(y ~ x, data = df, prior = priors) 
m1 |> pp_check()

Posterior P-Values

Remember, p-values tell you:

given a distribution \(\pi(\theta)\), what is the probability of observing a \(\theta\) as or more extreme than the one we did? In other words, how consistent is \(\theta\) with \(\pi(\theta)\)?

# generate samples from posterior
yrep <- posterior_predict(m1, ndraws = 500)

# overall mean function
overall_mu <- function(x) mean(x)

# calc for actual data
overall_mu(df$y) 
[1] 0.236782
# calc for posterior samples
ppc_stat(df$y, yrep, stat = "overall_mu", binwidth = 0.005)
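The plot from `ppc_stat()` can be summarized as a single posterior predictive p-value; a minimal sketch, reusing `yrep`, `df`, and `overall_mu` from above:

```r
# posterior predictive p-value for the mean:
# how often is the simulated mean as or more extreme than the observed mean?
stat_rep <- apply(yrep, 1, overall_mu) # one mean per simulated dataset
stat_obs <- overall_mu(df$y)           # observed mean
mean(stat_rep >= stat_obs)             # values near 0 or 1 suggest misfit
```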

Bayesian Regression in R

mtcars Data

library(tidyverse)
data(mtcars)
glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

General Model Setup

mod1 <- brm(mpg ~ hp + gear, data = mtcars) 
prior_summary(mod1)
                   prior     class coef group resp dpar nlpar lb ub       source
                  (flat)         b                                       default
                  (flat)         b gear                             (vectorized)
                  (flat)         b   hp                             (vectorized)
 student_t(3, 19.2, 5.4) Intercept                                       default
    student_t(3, 0, 5.4)     sigma                             0         default

General Model Setup

priors <- c(
  prior("normal(0,5)", class = "b")
)

mod2 <- brm(mpg ~ hp + gear, data = mtcars,
            prior = priors) 
prior_summary(mod2)
                   prior     class coef group resp dpar nlpar lb ub       source
             normal(0,5)         b                                          user
             normal(0,5)         b gear                             (vectorized)
             normal(0,5)         b   hp                             (vectorized)
 student_t(3, 19.2, 5.4) Intercept                                       default
    student_t(3, 0, 5.4)     sigma                             0         default

Prior Predictive Check

mod2_pp <- brm(mpg ~ hp + gear, data = mtcars,
            prior = priors, sample_prior = "only") 
mod2_pp |> pp_check()

Summary

summary(mod2)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpg ~ hp + gear 
   Data: mtcars (Number of observations: 32) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    18.09      3.38    11.37    24.85 1.00     4413     3011
hp           -0.06      0.01    -0.08    -0.05 1.00     4294     2990
gear          3.10      0.80     1.50     4.73 1.00     4682     3090

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.22      0.44     2.49     4.21 1.00     3242     2756

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Convergence

library(bayesplot)
summary(mod2)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpg ~ hp + gear 
   Data: mtcars (Number of observations: 32) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    18.09      3.38    11.37    24.85 1.00     4413     3011
hp           -0.06      0.01    -0.08    -0.05 1.00     4294     2990
gear          3.10      0.80     1.50     4.73 1.00     4682     3090

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.22      0.44     2.49     4.21 1.00     3242     2756

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
mcmc_trace(mod2)

Posterior Predictive Check

mod2 |> pp_check()

Posterior Examination

mcmc_areas(mod2,
           area_method = "scaled height",
           pars = c("b_hp",
                    "b_gear")) + 
  geom_vline(xintercept = c(-0.1,0.1),
             color = "red",
             linetype = "dashed") # ROPE

(more on ROPE with a brms model here)
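The ROPE idea can also be computed numerically from posterior draws; a minimal sketch, with simulated draws standing in for the real posterior of `b_hp`:

```r
# hypothetical posterior draws (in practice, extract them with as_draws_df(mod2))
set.seed(2)
draws <- rnorm(4000, mean = 0.05, sd = 0.1)

# proportion of the posterior inside the ROPE [-0.1, 0.1]
mean(draws > -0.1 & draws < 0.1)
```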

Bayesian Logistic Regression in R

# s/o to Chat GPT for helping simulate the data
# libs
library(brms)
library(bayesplot)
library(tidybayes)

# sim data
set.seed(540)
n <- 200
parental_income <- rnorm(n, mean = 50, sd = 10) # income_in_k
z <- 1 + 0.05 * parental_income + rnorm(n, 0, 1)
p <- 1 / (1 + exp(-z))
passed_exam <- rbinom(n, 1, p) 
df <- data.frame(parental_income, passed_exam)

# priors
priors <- c(
  prior(normal(0, 5), class = "b"),
  prior(normal(0, 5), class = "Intercept")
)

# fit
fit <- brm(passed_exam ~ parental_income, data = df,
           family = bernoulli(), prior = priors,
           seed = 123, chains = 4, cores = 4, iter = 2000)

# summary
summary(fit)
 Family: bernoulli 
  Links: mu = logit 
Formula: passed_exam ~ parental_income 
   Data: df (Number of observations: 200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           1.30      1.69    -2.08     4.61 1.00     2293     2106
parental_income     0.04      0.03    -0.03     0.11 1.00     2132     1965

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# convergence
mcmc_trace(fit) 

# pp_check
pp_check(fit)

# plot params
mcmc_areas(fit,
           pars = c("b_parental_income"))

Bayesian Mixed Effect Models

# s/o to Chat GPT for helping simulate the data
# Load required libraries
library(brms)
library(bayesplot)
library(tidybayes)

# sim
set.seed(123)
n_schools <- 10
n_students <- 20
total_n <- n_schools * n_students

school <- factor(rep(1:n_schools, each = n_students))
parental_income <- rnorm(total_n, mean = 50, sd = 10) # Parental income in thousands of dollars
school_effect <- rnorm(n_schools, 0, 5)[school]
individual_error <- rnorm(total_n, 0, 5)
exam_score <- 50 + 0.5 * parental_income + school_effect + individual_error

df <- data.frame(school, parental_income, exam_score)

# priors
priors <- c(
  prior(normal(0, 10), class = "b"),
  prior(normal(50, 10), class = "Intercept"),
  prior(exponential(1), class = "sd")
)

# fit
fit <- brm(exam_score ~ parental_income + (1 + parental_income | school),
           data = df, family = gaussian(), prior = priors,
           seed = 123, chains = 4, cores = 4, iter = 2000)

# summary
summary(fit)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: exam_score ~ parental_income + (1 + parental_income | school) 
   Data: df (Number of observations: 200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~school (Number of levels: 10) 
                               Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept)                      1.03      0.97     0.03     3.66 1.00
sd(parental_income)                0.11      0.03     0.06     0.19 1.00
cor(Intercept,parental_income)    -0.02      0.55    -0.94     0.93 1.01
                               Bulk_ESS Tail_ESS
sd(Intercept)                      2631     1886
sd(parental_income)                1048     1628
cor(Intercept,parental_income)      348     1082

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          51.74      1.99    47.84    55.71 1.00     5261     2955
parental_income     0.48      0.05     0.37     0.58 1.00     2121     1788

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.94      0.26     4.45     5.47 1.00     4369     2284

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# converge
mcmc_trace(fit) 

# pp_check
pp_check(fit)

# plot
mcmc_areas(fit, pars = c("b_parental_income"))

Frequentist Regression

Interesting Point

The null sampling distribution is where we get our p-values and critical values.

Interesting Point

The alternative sampling distribution is where we get our confidence intervals.
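As a sketch of that idea: a 95% confidence interval centers the (approximately normal) sampling distribution on the observed estimate. The data here are made up for illustration.

```r
# 95% CI: estimate +/- 1.96 * standard error
set.seed(3)
x <- rnorm(100, mean = 10, sd = 2) # made-up sample
xbar <- mean(x)
se <- sd(x) / sqrt(length(x))
xbar + c(-1, 1) * qnorm(0.975) * se
```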

Common Hypothesis Tests

  • t-test

  • z-test

  • chi-square test

Z-test

z-statistic:

\[ z = \frac{\bar{x} - \mu_0}{se} \]

where \(\bar{x}\) is the observed sample statistic, \(\mu_0\) is the value expected under \(H_0\), and \(se\) is the standard error of the sample statistic.

z-statistics measure how far away the sample statistic is from the expected \(H_0\) statistic. It’s a z-score computed using the null distribution \(\mathcal{N}(\mu_0, se)\).


❓when sample statistics are compatible with \(H_0\), what type of z-statistics would you expect to see?

Z-test

Because we divide by \(se\), z-statistics are standardized, meaning that we can use the same standard cutoffs to create Neyman-Pearson rejection regions! z-statistics more extreme than \(z = \pm 1.645\) account for the 10% most extreme values. For the 5% most extreme we use \(z = \pm 1.96\).
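These cutoffs come straight from the standard normal quantile function:

```r
# two-sided rejection-region cutoffs from the standard normal
qnorm(c(0.05, 0.95))   # +/- 1.645: the 10% most extreme values (two-sided)
qnorm(c(0.025, 0.975)) # +/- 1.960: the 5% most extreme values (two-sided)
```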

Z-test

In a Fisherian or NHST framework, the z-distribution can also help us calculate p-values. Here \(p = 0.7\).
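A two-sided p-value is the tail area beyond the observed z-statistic in both directions; for instance, a z of about 0.385 (a value assumed here for illustration) gives \(p \approx 0.7\):

```r
# two-sided p-value for an observed z-statistic
z <- 0.385
2 * pnorm(-abs(z)) # about 0.70
```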

Z-test

Common z-tests:

  • z-tests for means (known \(\sigma\))

  • z-tests for proportions (believe it or not, \(\sigma\) also known)

Z-test: Example

# is the mean weight of graduate student hydroflasks different from 500g?
# hydroflasks at Chapman have a known sd of 150g

# h0: mu = 500
# ha: mu != 500

# by hand
(mean(dat) - 500)/(150/sqrt(length(dat)))
[1] 2.176504
# using alpha = 0.1, we reject the null! there is a stat sig difference in the weight of grad student water bottles

t-test

t-statistic:

\[ t = \frac{\bar{x} - \mu_0}{se} \]

where \(\bar{x}\) is the observed sample statistic, \(\mu_0\) is the sample statistic expected under \(H_0\), and \(se\) is the standard error.

t-statistics also measure how far away the sample statistic is from the expected \(H_0\) statistic.


❓when sample statistics are compatible with \(H_0\), what type of t-statistics would you expect to see?

t-test

Because we divide by \(se\), t-statistics are standardized, meaning that we can use the same standard cutoffs (depending on df) to create Neyman-Pearson rejection regions!

In a Fisherian or NHST framework, the t-distribution can also help us calculate p-values.
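Unlike the z cutoffs, the t cutoffs depend on the degrees of freedom; `qt()` shows them converging to 1.96 as df grows:

```r
# two-sided 5% cutoffs for t-distributions with increasing df
qt(0.975, df = c(5, 30, 100, 1e5))
# heavier tails at small df; approaches the z cutoff of 1.96 as df grows
```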

t-test

Common t-tests:

  • t-tests for mean (unknown \(\sigma\))

  • t-test for regression coefficients

  • t-test for difference between means

t-test: Example

# is the mean weight of bumblebees in florida different than the mean bumblebee weight of 175mg?

t.test(data,
       alternative = "two.sided",
       mu = 175)

    One Sample t-test

data:  data
t = -0.3004, df = 99, p-value = 0.7645
alternative hypothesis: true mean is not equal to 175
95 percent confidence interval:
 172.7395 176.6660
sample estimates:
mean of x 
 174.7028 

t-test: Example

library(TOSTER)

tsum_TOST(m1 = mean(data),
          mu = 175,
          sd1 = sd(data),
          n1 = length(data),
          low_eqbound = 170,
          high_eqbound = 180)

One-sample t-test

The equivalence test was significant, t(99) = 4.753, p = 3.4e-06
The null hypothesis test was non-significant, t(99) = -0.300, p = 7.65e-01
NHST: don't reject null significance hypothesis that the effect is equal to 175 
TOST: reject null equivalence hypothesis

TOST Results 
                 t df p.value
t-test     -0.3004 99   0.765
TOST Lower  4.7530 99 < 0.001
TOST Upper -5.3538 99 < 0.001

Effect Sizes 
           Estimate     SE                 C.I. Conf. Level
Raw         -0.2972 0.9894 [173.0599, 176.3456]         0.9
Hedges's g  17.5226 1.2431   [15.4236, 19.5332]         0.9
Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").

t-test: example

# is the difference in mean weights of bumblebees in florida and california different than 0?

t.test(x = data1,
       y = data2,
       alternative = "two.sided")

    Welch Two Sample t-test

data:  data1 and data2
t = -4.622, df = 195.18, p-value = 6.898e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.118912 -3.262520
sample estimates:
mean of x mean of y 
 174.5142  180.2050 

t-test: Example

    city temperature  rainfall bee_weight
1 City 1    24.64579  933.8643   9.294917
2 City 1    22.91725 1049.5396   3.650119
# fit model
m1 <- lm(bee_weight ~ rainfall + temperature,
         data = simulated_data)

# regression table
summary(m1)

Call:
lm(formula = bee_weight ~ rainfall + temperature, data = simulated_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.0941 -1.3528 -0.0109  1.3792  8.6508 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.0597404  0.1251260   40.44   <2e-16 ***
rainfall    -0.0050555  0.0001271  -39.78   <2e-16 ***
temperature  0.2991710  0.0045176   66.22   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.999 on 9997 degrees of freedom
Multiple R-squared:  0.3112,    Adjusted R-squared:  0.311 
F-statistic:  2258 on 2 and 9997 DF,  p-value: < 2.2e-16

t-test: Example

    university      age   group gender caffeine_mg gestures
1 University 3 25.11077 Group B  Woman   151.92202       43
2 University 3 23.72048 Group B  Woman    74.42176       29
# does caffeine intake influence the number of gestures used in conversation?
library(lme4)

# z score
data <- data |> mutate(across(all_of(c("age", "caffeine_mg")),
                              ~ (.-mean(.)) / sd(.)))

m2 <- glmer(gestures ~ age + gender + caffeine_mg + (1 | university),
            data = data,
            family = poisson)

m2 |> summary()
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: gestures ~ age + gender + caffeine_mg + (1 | university)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   641.5    657.1   -314.7    629.5       94 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.42780 -0.57690  0.04138  0.61653  2.07566 

Random effects:
 Groups     Name        Variance  Std.Dev.
 university (Intercept) 0.0001464 0.0121  
Number of obs: 100, groups:  university, 5

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)       3.72267    0.03285 113.340   <2e-16 ***
age              -0.02396    0.01694  -1.415   0.1572    
genderNon-binary -0.10289    0.04393  -2.342   0.0192 *  
genderWoman      -0.09640    0.04279  -2.253   0.0243 *  
caffeine_mg       0.04686    0.01654   2.834   0.0046 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) age    gndrN- gndrWm
age         -0.225                     
gndrNn-bnry -0.761  0.250              
genderWoman -0.777  0.237  0.617       
caffeine_mg  0.105  0.029 -0.146 -0.119

\(\chi^2\)-test

\(\chi^2\)-statistic:

\[ \chi^2 = \sum \frac{(o-e)^2}{e}\\ \underbrace{\chi = \frac{(o-e)}{\sqrt{e}}}_\text{for demo purposes only} \]

where \(o\) is an observed count, \(e\) is the count expected under \(H_0\), and the sum runs over the cells.

\(\chi^2\)-statistics also measure how far away the sample statistic is from the expected \(H_0\) statistic.


❓when sample statistics are compatible with \(H_0\), what type of \(\chi^2\)-statistics would you expect to see?

\(\chi^2\)-test

Common \(\chi^2\) tests:

  • Homogeneity: do two groups have the same distribution for one variable in the dataset?

  • Independence: are two variables in a dataset related?

  • Goodness of Fit: does one variable in the dataset match our expected distribution?

\(\chi^2\)-test

I am satisfied with the quality of my healthcare

             Strong Agree  Agree  Neither  Disagree  Strong Disagree
Blue Shield            15     10       40        20               15

Test of GOF: does the distribution of responses match our goal distribution of 20/25/30/15/10%?

\(\chi^2\)-test

OBSERVED     Strong Agree     Agree   Neither  Disagree  Strong Disagree
Blue Shield            15        10        40        20               15

EXPECTED     Strong Agree     Agree   Neither  Disagree  Strong Disagree
Blue Shield      100*0.20  100*0.25  100*0.30  100*0.15         100*0.10

\(\chi^2\)-statistic: \(\chi^2 = \sum \frac{(o-e)^2}{e}\)
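Plugging the observed and expected counts into this formula by hand reproduces the test statistic:

```r
# chi-square GOF statistic computed by hand
o <- c(15, 10, 40, 20, 15)              # observed counts
e <- 100 * c(0.2, 0.25, 0.3, 0.15, 0.1) # expected counts under H0
sum((o - e)^2 / e)                      # 17.75, the same value chisq.test() reports
```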

\(\chi^2\)-test

chisq.test(x = c(15,10,40,20,15),
           p = c(0.2,0.25,0.3,0.15,0.1))

    Chi-squared test for given probabilities

data:  c(15, 10, 40, 20, 15)
X-squared = 17.75, df = 4, p-value = 0.001381

\(\chi^2\)-test

I am satisfied with the quality of my healthcare

                   Strong Agree  Agree  Neither  Disagree  Strong Disagree
Blue Shield                  15     10       40        20               15
United Healthcare            10      5       20        40               25

Test of Homogeneity: Is the distribution of responses different for the two groups?

\(\chi^2\)-test

summary_data <- rbind(bs = c(15,10,40,20,15),
                      uh = c(10,5,20,40,25))
chisq.test(summary_data)

    Pearson's Chi-squared test

data:  summary_data
X-squared = 18.5, df = 4, p-value = 0.0009851

\(\chi^2\)-test: Example

# is the proportion of heads different from 0.5? do we have a fair coin?

prop.test(sum(dat), # number of heads (prop.test expects a count, not a proportion)
          length(dat), # number of flips
          p = 0.5, alternative = "two.sided")

\(\chi^2\)-test: Example

# Set seed for reproducibility
set.seed(42)

# Define the expected color distribution (e.g., based on a typical Skittles bag)
expected_proportions <- c(red = 0.2, orange = 0.2, yellow = 0.2, green = 0.2, purple = 0.2)

# Simulate the actual observed counts of Skittles in a bag (e.g., a bag of 100 Skittles)
total_skittles <- 100
observed_counts <- rmultinom(1, total_skittles, prob = expected_proportions)[,1]

# print
print("Observed counts:")
[1] "Observed counts:"
print(observed_counts)
   red orange yellow  green purple 
    26     24     15     20     15 
# test
expected_counts <- total_skittles * expected_proportions
chisq_test <- chisq.test(x = observed_counts, p = expected_proportions)
print("Chi-square GOF test result:")
[1] "Chi-square GOF test result:"
print(chisq_test)

    Chi-squared test for given probabilities

data:  observed_counts
X-squared = 5.1, df = 4, p-value = 0.2772

\(\chi^2\)-test: Example

# Set seed for reproducibility
set.seed(123)

# Define the number of students in each group
n_eecs <- 100
n_cads <- 100

# Define the expected distributions of undergraduate majors 
prob_eecs <- c(math = 0.3, business = 0.1, humanities = 0.05, 
               computer_science = 0.4, data_science = 0.15)
prob_cads <- c(math = 0.2, business = 0.25, humanities = 0.1, 
               computer_science = 0.25, data_science = 0.2)

# Simulate data using multinomial distribution
observed_eecs <- rmultinom(1, n_eecs, prob = prob_eecs)[,1]
observed_cads <- rmultinom(1, n_cads, prob = prob_cads)[,1]

# Combine data into a table for the chi-square test
observed_data <- rbind(EECS = observed_eecs, CADS = observed_cads)
colnames(observed_data) <- c("Math", "Business", "Humanities", 
                             "Computer Science", "Data Science")

# print
print("Observed data for EECS and CADS students:")
[1] "Observed data for EECS and CADS students:"
print(observed_data)
     Math Business Humanities Computer Science Data Science
EECS   30        9          8               33           20
CADS   13       27         15               25           20
# test
chisq_test <- chisq.test(observed_data)
print("Chi-square test of homogeneity result:")
[1] "Chi-square test of homogeneity result:"
print(chisq_test)

    Pearson's Chi-squared test

data:  observed_data
X-squared = 18.955, df = 4, p-value = 0.0008022